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In the case of translational noncrystallographic symmetry 
(tNCS), two or more copies of a component in the asymmetric 
unit of the crystal are present in a similar orientation. This 
causes systematic modulations of the reflection intensities in 
the diffraction pattern, leading to problems with structure 
determination and refinement methods that assume, either 
implicitly or explicitly, that the distribution of intensities is a 
function only of resolution. To characterize the statistical 
effects of tNCS accurately, it is necessary to determine the 
translation relating the copies, any small rotational differences 
in their orientations, and the size of random coordinate 
differences caused by conformational differences. An algo- 
rithm to estimate these parameters and refine their values 
against a likelihood function is presented, and it is shown that 
by accounting for the statistical effects of tNCS it is possible to 
unmask the competing statistical effects of twinning and tNCS 
and to more robustly assess the crystal for the presence of 
twinning. 
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1. Introduction 

There have been great advances in the methods available for 
macromolecular crystallography, such that a significant frac- 
tion of structure determinations are now relatively straight- 
forward. However, there is still the potential for serious 
complications when the crystals possess features that break 
the assumptions underlying the routine structure-solution 
pathways. The presence of translational noncrystallographic 
symmetry (tNCS) is particularly insidious in causing difficul- 
ties in all stages of crystal structure determination, from 
indexing the diffraction pattern to refining the structure. 

In tNCS, two or more crystallographically independent 
copies are in the same (or nearly the same) orientation in the 
unit cell. Their contributions to a structure factor have the 
same (or similar) amplitudes but have relative phases deter- 
mined by the projection of the translation vector on the 
diffraction vector. As a result, they interfere constructively for 
some reflections and destructively for others, so that there is a 
systematic modulation of the sum of their contributions. The 
most serious case is when the translation is approximately, but 
not exactly, equal to a potential lattice translation such as a 
centring operator or a cell doubling. The exact relationship is 
often broken by a small rotation (typically less than 10°) in 
addition to the translation. Such translations are referred to 
as pseudo-translations or pseudo-centrings because of their 
pseudo-crystallographic nature, and they lead to pronounced 
effects, with large numbers of systematically very weak and 
very strong reflections. The perturbation of the distribution of 
intensities leads to difficulties with statistical tests based on 
intensity statistics, as well as violating the assumptions behind 
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likelihood targets for phasing and refinement, which assume 
that the data follow an isotropic Wilson distribution. 

Translational NCS is a frequent issue in solved macro- 
molecular crystal structures. The frequency of tNCS has been 
investigated by Zwart et al. (2005). The existence of tNCS can 
be detected by the presence of a large non-origin Patterson 
peak. Using the criterion that a non-origin peak greater than 
20% of the origin peak was present in a Patterson map 
computed using data to 5 A resolution, it was found that about 
8% of structures deposited in the Protein Data Bank (PDB; 
Berman et al, 2000) probably possess tNCS. Translational 
NCS can also prevent structure solution, for which there are 
anecdotal accounts but no statistical records. 

In the following, the effect of tNCS on structure-factor 
intensity statistics is investigated. A method to characterize 
the parameters describing the tNCS has been developed and 
tested, and it is shown that corrected intensity statistics can be 
used to detect the presence of twinning. The implications for 
molecular replacement, experimental phasing and refinement 
will be explored in subsequent publications. 



2. Statistical effects of noncrystallographic symmetry 

A full maximum-likelihood treatment of NCS would cover the 
very general case of a number of different components that 
are related by different noncrystallographic symmetries. In 
practice, the NCS-related deviations in structure-factor 
intensities from an isotropic Wilson distribution are most 
serious when there is exact translational NCS or nearly exact 
translational NCS (a small rotation is present), particularly 
if these are translations close to crystallographic centring 
operators and if only one set of NCS operators is present. For 
this reason, and for simplicity of notation, we will only deal 
with the case where there is one set of NCS operators, 
although the formulae are presented in a way that may be 
generalized to multiple sets of operators. In order to deal with 
the very common case that the relationship is not a perfect 
translation but is rather a translation combined with a small 
rotation, we start with the case of NCS operations that 
combine translations with rotations of any size. 

2.1 . Covariance elements sensitive to the effects of 
noncrystallographic symmetry 

The statistical effects of NCS are easiest to evaluate by 
considering correlations between NCS-related contributions 
to the structure factors and then assembling them into a 
picture of the overall effects of NCS. 

As pointed out by Bricogne (1997), the presence of NCS 
leads to modulations in the intensities, which can be used to 
characterize the nature of the NCS. The following treatment of 
intensity statistics is similar in spirit to that of Bricogne, with 
the addition of an allowance for small random differences 
among the NCS-related copies in the positions and scattering 
factors of the atoms that make them up. As in Bricogne (1997) 
we will not consider correlations among structure factors, 



so the structure factors are all implicitly assumed to be for 
reflection h. 

Consider a crystal containing in its asymmetric unit two or 
more copies of components with similar structure. The total 
structure factor (F) is made up of contributions from copies 
related by a combination of iV ncs noncrystallographic and A^ sym 
crystallographic operations, 

2V syn N 

F = E E F *m> 
k=\ m=l 

N 

Vkm = Hfjm exp(27rih ■ X jkm ), (1) 

where 

X ,*m = T k[0~ 1 F y m O(x j + F S jm ) + F V m ] + t k 

= T,0- 1 f V m O(x y + F S jm ) + (T kF v m + t k ). 

In this, there is an allowance for differences in the scattering 
factors for atoms in different copies (f Jm could differ among 
NCS-related molecules m, particularly because of differences 
in the incorporated effects of B factors). The coordinates are 
represented in terms of those from a canonical copy of the 
molecule centred on the origin and conformational differences 
relative to that molecule ( F S ]m ). For convenience, we can take 
the canonical copy to be in the same orientation as the copy 
with k = m = 1, so that x ; - = x, n — F \ 1 — P 8j\ and F \ l is an 
identity matrix. Note that since conformational differences are 
assigned even to the first copy, the canonical copy can be 
considered to be an average structure. The number of atoms in 
one copy of the component is given by N. The NCS rotations 
could be represented in terms of one matrix, C, in the notation 
used by Bricogne (1997), but the physical meaning is easier to 
understand in terms of rotations ( F \ m ) in orthogonal space, so 
that the transformations from (O) and to (O -1 ) fractional 
coordinates must be included explicitly. The crystallographic 
symmetry operations are represented by a rotation matrix, T k , 
and a translation vector, t k . 

We start by considering the covariances among the contri- 
butions to the structure factor where (similar to the case of 
experimental phasing; Read, 2003) terms between common 
atoms will dominate, 

FhnK) ~ EifjJin exp[27rih ■ (X jkm - X// „)]) . (2) 

7=1 

For covariances involving atoms within the same copy (k = I 
for crystallographic symmetry and m = n for noncrystallo- 
graphic symmetry), we can consider the atoms to be inde- 
pendent because we have factored out any relationships 
leading to correlations, 

N 
7=1 ' 

If the expressions for the transformed coordinates are 
entered explicitly, the dot product inside the exponential in (2) 
can be expanded as follows: 
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+ (T* 



■ t*) - (T IF v„ + t,) 



(4) 



+ (T t O- 1 f V m O F S ;m - T^-VV^^)]. 

With some rearrangement and changes of variable, this can 
be expressed more succinctly: 



h ' ( X jkm X jln) 

where 



FFyklmn ' X j + ^ 



h " FF^jklmn' 



(5) 



(O r f VlO- ir Tj - O r f V„ r O- ir T, r )h, 



FF y klmn — i^kF^m + h) ~ C^lF^n + */)> 
FF&jklmn = T^O 1 F V m O F 8j m ~~ T ; 0 ^ F^rS^F^j, 



so that 

N 

(VkmK) - T,(fjJjn exp(2m J7i7 h Hm „ • x y ) exp(2jrih • FF \ klmn ) 

x exp(2jrih- FF S /Jtto J). (6) 

The first exponential term in (6) accounts for the effect of 
rotation on interference, with FF h ktmn being equal to the 
difference between two copies of the original index h rotated 
by different combinations of crystallographic and noncrys- 
tallographic symmetry in the crystal; the closer FF h k i mn is to 
zero, the larger the interference effect. The second exponen- 
tial accounts for a systematic translation-derived phase shift 
between the contributions of the two copies of the component. 
The third exponential (along with the scattering factors) 
accounts for the effects of differences among the NCS-related 
copies. Note that if the coordinate differences are considered 
to be drawn randomly from a spherically symmetric distribu- 
tion, then rotating these differences (e.g. in the variable F & jm ) 
will not change the nature of their probability distributions, so 
that the distribution of FF <5 yjHm „will be independent of k and /. 
(The subscripted prefix FF indicates terms relating two 
contributions to the observed structure factor, F, to distinguish 
them from terms relating contributions involving calculated 
structure factors, G. Such terms will be needed for subsequent 




Figure 1 

G-function computed from the Fourier transform of a sphere centred on 
the origin plotted as a function of the product of r and s, i.e. the ratio of 
the sphere radius and the resolution. 



work on applications to molecular replacement, experimental 
phasing and refinement.) 

For the covariances between copies related purely by 
crystallographic symmetry (m = n but k ^ I), the presence or 
absence of tNCS is not relevant. These terms will only differ 
significantly from zero when the symmetry rotation is parallel 
to the diffraction vector (T^h = Tfh, so that FF h klmn = 0). When 
there is no phase shift between the contributions of these 
copies, they will contribute to increasing the expected intensity 
factor; otherwise, they will lead to systematic absences. Such 
pairs of contributions can be handled in a simple fashion by 
setting the co variance terms for m = n, k ^ I to zero and then 
multiplying the remaining diagonal elements in the covariance 
matrix by the usual expected intensity factor s. 

The interesting covariances are those between copies 
related by noncrystallographic symmetry (m ^ n). If we 
assume that the differences in scattering factors and atomic 
positions are independent of the positions of the atoms within 
the components, then the expected value can be treated as a 
product of expected values, separating the correlation ( FF p mn ) 
of the structure factors for the components if they were in the 
same position and orientation from the interference effects, 



( ¥ kmK) ~ FFPm„( S Fm S F„) 1/2 ( eX P( 27r 'FF h «m„ ' X ,)> 

x (exp(2mh- FF y klmn )}, 



(7) 



where 



Fm Fn' 



1/2 



= ( JlfjJjn exp(2jr/h ■ FF S jklmn ) 



\i =1 



If there is an atomic model, then at least the approximate 
locations of the atoms in each component are known, so that 
the expected value of the rotational interference term can be 
computed. However, if we are characterizing translational 
NCS prior to structure solution, the best we will have is some 
idea of the envelope containing the component. In this case, 
the expected value of the interference term is an integral over 
the volume of the envelope (denoted U F for the volume of 
a unique component contributing to the structure factor F), 
which is equivalent to the Fourier transform of the envelope 
or a G-function (Rossmann & Blow, 1962). Because the 
envelope is finite in volume and does not possess crystallo- 
graphic symmetry, it is convenient to index it in terms of a 
diffraction vector (in units of A -1 ), 



(cxp(2m FF h klmn • x ; .)) = / exp(2m FF h klmn ■ x y ) 
u F 

— G F ( FF s k i mn ), 



(8) 



where 



FFyklmn 



= o 



= (fVlO 



FF u klmn 



-lTrpF 

It 



F v„ r o- 



r T, r )h. 



Before the shape of the molecule (or at least its orientation) 
is known, it may be appropriate to approximate it as a sphere 
with radius r, so that the G-function is the Fourier transform of 
a sphere (Rossmann & Blow, 1962), 
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G(r,\ FF s klmn \) 

3[sm(2nr\ FF s klmn \) - 2nr\ FF s klmn \ cos(2nr\ FF s klmn \)] 



(9) 



A G-function computed from a sphere centred on the origin 
(Fig. 1) gives insight into the general behaviour of the inter- 
ference term; the G-function differs significantly from zero 
only for values of FF s k / mn with a magnitude substantially less 
than the reciprocal of the sphere radius. G-functions from 
volumes with finer details in their shapes and lacking 
symmetry will also lack spherical symmetry and will have 
features extending to higher resolution, although the largest 
values will still be close to the origin. 

The argument of the G-function, FF s ktmm will be near zero 
either when the two corresponding copies of the structure 
component (related by combinations of crystallographic and 
noncrystallographic symmetry) are in nearly the same orien- 
tation or when the rotation axis is nearly parallel to the 
diffraction vector, so that 



lT Tlh : 



F v„ r o- 



1T Tfh. 



(10) 



The former condition will apply for all structure factors, 
leading to an overall modulation of the diffraction pattern, 
while the latter condition will lead to spikes in the diffraction 
pattern with a significant modulation (Bricogne, 1997). The 
maximum modulation along the direction of the spikes arising 
from this component of the symmetry would be equal to the 
number of copies in the asymmetric unit. However, the 
maximum would only be reached if the direction of the rota- 
tion axis coincided with the diffraction vector and if the 
disposition of the copies were such that they were equally 
spaced between the Bragg planes. In principle, knowing the 
directions of such spikes would contribute to understanding 
the rotational part of the NCS, and the pattern of intensity 
modulation along these spikes would give information about 
the relative positions of copies of components. However, this 
is a minor contribution to the overall modulation of the 
structure-factor intensities in the case of translational NCS. 
Including this term does not significantly alter the corrective 
factors, but does significantly increase the computation time 
(results not shown). In the remainder we will neglect the 
contribution to the covariances of copies in significantly 
different orientations. 

Although a noncrystallographic translation can be gener- 
ated by a combination of crystallographic symmetry and 
noncrystallographic symmetry (for example, a crystallographic 
twofold and a nearly parallel noncrystallographic twofold), we 
can choose without loss of generality to consider the copies 
related by noncrystallographic translations as belonging to 
the same asymmetric unit, so that k = I for the pairs we will 
consider; the covariance elements (F tm F*„) will be approxi- 
mated as zero for k ^ I. (As above, we deal with the case in 
which the symmetry rotation is parallel to the diffraction 
vector by multiplying included terms by the expected intensity 
factor e.) This leads to simplification of the expressions in the 
covariances, 



FFvkkmn ~ ^kKF^m ~ F V n)' 
FF S kkmn = (f^ m ~ fV„)0 »jfc ■ 
FF&jkkmn = TjfcO (pV m O F 8j m — F \ n O F Sj n ). 



T t r h, 



(11) 



Note that the phase-shift term containing FF \ kkmn now only 
depends on the translation vector between the NCS-related 
copies and not on the translational component of the crys- 
tallographic symmetry operators. This has the advantage that 
an analysis of the effects of tNCS can be carried out when the 
Laue group is known but not necessarily the particular space 
group. 

2.2. Effect of tNCS on the expected intensity of the observed 
structure factor 

Correlations among the components of the structure factor 
lead to systematic modulation of the observed intensities. 

The variance (expected intensity) of the structure factor 
that is the sum of the contributions of the different compo- 
nents is the sum of all of the covariances between these 
contributions. This is simplified by the fact that we are ignoring 
terms between different crystallographic symmetry operators 
and collecting their influence in the expected intensity factor s. 
To allow simply for the possibility of a part of the crystal that 
does not obey these NCS operators, we can add a term Y* Fr 
for the rest of the structure. (Note that T, Fr could include the 
contribution of another component with a different set of NCS 
operators, showing how the treatment presented here could 
easily be generalized.) 

(F 2 ) = eX Fr + £ j: £ E Fm + 2 £ FF p mn (^ Fm ^ Fn f /2 

k—1 m— 1 y n=m+l 



x Re[G F ( FF s kkmn ) exp(2jrih ■ FF \ kkmn )] 



(12) 



In this expression, terms with m < n have been paired with 
their complex conjugates, i.e. the terms with m > n, so that 
the imaginary parts cancel. The unmodulated terms can be 
collected into a term representing the intensity that would be 
expected after averaging over the modulations, E N , 



1/2 



W S ,m iV ncs N m 

(F 2 ) = £ £„ + 2s £ £ £ FF p mn (X Fm X Fn ) 

k=l m—1 n= m+1 

x Re[G F Us**™) exp(2jrih • FF v kkmn )] 
= e£ J 1 + 2 £ E £ FFpm " (ljFmljFn) 



l k=lm=\n=m+l 

x Re[G F ( FF s kkmn ) exp(2jrih ■ FF v kkmn )] j . (13) 

The term in the curly braces can be thought of as an extra s 
factor accounting for the modulation of the intensities by NCS. 

This general expression could be applied when there is an 
atomic model, which defines the envelope enclosing the parts 
of the structure that obey tNCS, and the rotations and trans- 
lations that relate these parts of the structure. Before the 
structure is solved, there is no way to know the shape of the 
envelope (or at least how it should be oriented, if there is a 
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molecular-replacement model), so it is simplest to assume a 
sphere, in which case the G-function is real and depends only 
on the resolution. This approach should capture the most 
important effects of tNCS even when there is a detailed atomic 
model, 



(F 



^ _|_ 2 r 7* r mn\ tm rn ' 
k=\ m=ln=m+l 



x G F(\ F F s k km n I) cos(2jrh • FF v ttm „) 



(14) 



For the very common special case in which there is only one 
translational NCS operator, the equation can be simplified 
further, 



(F 2 ) = eE J 1 + 2 F r ncs £ G F (| FF s ttl2 |) 



x cospjrTjh ■ (>! - F v 2 )] , 



(15) 



where 



FF p 12 (E F1 E F2 ) ^ 



In this form, the weight F r ncs applied to the modulation term 
is effectively the fraction of the scattering of one component in 
the unit cells that obeys the translational NCS, corrected for 
the effect of differences among tNCS-related copies. Note that 
this automatically allows the presence of a component that 
does not obey tNCS. 

3. Simulations to test the probability distributions 

The probability distributions describing the statistical effects 
of tNCS have been tested by simulations in Mathematica 




Figure 2 

Predicted average intensity in the direction parallel to c* for a crystal 
(space group PI, unit-cell parameters a = b = c = 50 A, a= fi = y = 90°) 
containing two copies [separated by a fractional translation of (0.47, 0.47, 
0.47), i.e. approximately body-centred] of a spherical molecule (r = 20 A) 
comprised of 200 single-electron point scatterers. The solid lines shows 
the case in which the two copies are identical in conformation but differ 
by a 5° rotation around the x axis (black line) or around the z axis (grey 
line). The dashed line shows the case in which the two copies are in the 
same orientation but have r.m.s. coordinate differences of 1.5 A. 



(v.8.0; Wolfram Research, Champaign, Illinois, USA). In these 
simulations, data have been generated for a crystal in space 
group PI containing two 'molecules' related by tNCS. For the 
first copy of the molecule, atoms were generated randomly 
within a sphere and copies of these atoms were then generated 
by applying a small rotation, a translation and a random shift. 
Since the molecules have a spherical envelope, the G-function 
is the Fourier transform of a sphere, as discussed by Rossmann 
& Blow (1962). The simulations show that accounting for the 
effects of orientation and conformation differences between 
tNCS-related copies will be essential to gain a good agreement 
between theory and observation. 

3.1. Modulations of observed intensities 

As described by (13), tNCS introduces a modulation of the 
expected intensities depending primarily on the phase shift of 
the contributions from copies related by tNCS. The modula- 
tion drops in strength if there are differences in the confor- 
mations or the orientations of the copies. Fig. 2 illustrates the 
effects of random coordinate differences (assumed to be 
drawn from a Gaussian distribution) and differences in 
orientation on the strength of modulation for structure factors 
obtained from a crystal with two spherical molecules. Note 
that when the model is complete and the two copies scatter 
with the same strength then the term F r ncs in (15) is equal to 
half of the complex correlation between these copies ffPvi- 
When the coordinate differences are drawn from a Gaussian 
distribution with an r.m.s. coordinate difference of a r , then this 
complex correlation can be calculated using the appropriate 
formula for a A , which is also a complex correlation (Read, 
1990), 




Figure 3 

Comparison of predicted average intensity (line) with simulated average 
intensity (points). The crystal is equivalent to that used for Fig. 2, except 
that the two copies differ by a rotation of 2° around the x axis and an 
r.m.s. coordinate difference of 0.5 A. Each point (corresponding to a 00/ 
reflection) is obtained by carrying out 1000 simulations in which 200 
atoms are generated randomly within the spherical envelope of the first 
molecule (centred on the origin); the second copy is then generated by 
perturbing these atomic positions followed by rotation and translation. 
The points for the first-order and second-order reflections are omitted 
because the assumptions behind the Wilson (1949) distribution are 
violated when the Bragg spacings are large compared with the size of the 
molecular envelope. 
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ffPu — ex P 



2tt 2 



(16) 



As shown in Fig. 2, random conformational differences and 
rotational differences between the copies can have a similar 
effect on the strength of the intensity modulation, except that 
there is a direction-dependence of the effect of the rotation 
difference: a rotation around the diffraction vector has no 
effect (because it does not change the positions of the atoms 
relative to the Bragg planes), whereas a rotation around an 
axis perpendicular to the diffraction vector has a large effect. 
This figure also shows that the information to distinguish the 
effects of random conformational differences and rotational 
differences may be most obvious at higher resolution. 

The simulation in Fig. 3 demonstrates that (15) provides an 
excellent description of the average intensities for different 
reciprocal-lattice vectors, even when there is a combination of 
conformational and orientation differences between the 
copies. 

4. Refining parameters characterizing tNCS 

To characterize tNCS from a data set, parameters describing 
the NCS translation, the difference in orientation of the 
tNCS-related copies and the random differences between the 
structures of the copies must be estimated and refined. This 
has been implemented with the following algorithm in Phaser 
(McCoy et al, 2007). The current implementation is optimized 
for the common case of two copies related by tNCS. Multiple 
tNCS copies can also be handled, as long as the copies are 
generated by successive applications of the same translation 
vector, but a more general treatment has not yet been 
implemented. The parameters characterizing the tNCS are 
refined against a likelihood function given by the Wilson 
(1949) distribution of amplitudes for acentric reflections, 

IF ( F 2 \ 
p i (F) = Tr ^exp[ 



(F 2 ) 



(F 2 ))' 



(17) 



or centric reflections, 



Pc(F) = 



x(F*)) 



V' 2 ( F 2 \ 



(18) 



In this likelihood function, the expected value of the 
intensity is computed using (14), so the refined parameters are 
the parameters from that equation. 

An initial estimate of the translation vector between the 
two copies (or the first two of successive copies), j?Vi — F \ 2 , is 
obtained from the largest off-origin peak in a native Patterson 
map. If the translation is close to a centring operator, 
symmetry-related copies of the Patterson peak will merge into 
a single peak on a special position. Refinement would not be 
able to move this translation vector to one of the equidistant 
symmetry copies so, if the Patterson peak is on a special 
position, the translation vector is first perturbed by a small 
translation of d mi J6 in each of the x, y and z directions; we 
have found this to be sufficient to avoid the refinement being 
trapped on an exact centring translation. 



A refinement of the relative orientation is carried out if 
there are two copies related by tNCS; for multiple copies, we 
currently approximate the effect of rotational differences as 
random differences among copies related by a pure transla- 
tion. Because the orientation refinement does not always 
converge uniquely from any starting point, refinements are 
started from several relative orientations and that giving the 
best agreement with the data is chosen. The rotational 
difference between the two copies is parameterized as a 
combination of small rotations about the x, y and z axes, which 
behave well in refinement because they are approximately 
orthogonal. Note that when the exact shape and size of the 
molecule that obeys tNCS is not known, there is a trade-off 
between the assumed radius of the sphere that approximates 
the molecular envelope and the size of the rotation angles. The 
rotational difference enters the likelihood target through the 
G-function term, which depends on the amount by which the 
rotational difference rotates the diffraction vector. For small 
rotations, the absolute size of the movement of the diffraction 
vector is, to a good approximation, proportional to the rota- 
tion angle, so an error in the assumed sphere radius can be 
compensated by a reciprocal change in the size of the rotation 
angle. 

Finally, the complex correlation between pairs of tNCS- 
related copies ( FF p mn in 14) is currently assumed to be 
equivalent for all pairs when there is more than one NCS 
translation, and we do not currently account for the possibility 
of different overall B factors among the copies. In this case, we 
can refine the resolution-dependent parameter F r ncs assumed 
to be equivalent for all pairs of tNCS-related copies. In Phaser 
this is reported as a Luzzati D factor (Luzzati, 1952). In fact, 
the refined parameter is given by the corresponding variance 
term 



- 1 - r 2 

— ± F t ncs> 



(19) 



which has better refinement properties, as the likelihood 
function is more nearly quadratic when expressed in terms of 
this parameter. 

5. Intensity moments in the presence of tNCS 

Intensity moments can be a useful diagnostic for the presence 
of twinning (Stanley, 1972; Rees, 1980), but their usefulness 
can be reduced by other influences on the distribution of 
intensities, such as overall anisotropy and, in particular, tNCS 
(Padilla & Yeates, 2003; Lebedev et al, 2006). Corrections for 
overall anisotropy are now well established (Popov & Bour- 
enkov, 2003; McCoy et al, 2007). We were interested in 
determining whether a further correction for the statistical 
effects of tNCS would at least partially unmask the statistical 
effects of twinning. 

£-values that have been corrected for the statistical effects 
of tNCS can be computed using the expression for the 
expected intensity in (14), 

(20) 



(F 2 ) 



1/2 
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Table 1 

Second moments of intensity ({E 4 )!{E 2 ) 2 ) in the presence and absence of twinning. 

Before anisotropy correction Before tNCS correction After tNCS correction 

PDB code Centric Acentric AB aniso (A 2 ) Centric Acentric Centric Acentric Twin fraction p-value 



2.81 3.01 2.01 -t 1 

2.88 2.73 1.97 -t 0.221 

1.88 - 1.75 0.08 1.4 x f0~ 2( 

2.38 2.56 1.81 0.10 3.6 x 10"" 

1.84 2.52 1.71 0.46 2.1 x f0~ 7 ' 



2fuq 


5.04 


3.10 


29.0 


4.42 


fun7 


4.33 


2.84 


4.5 


4.44 


fy9r 




1.88 


0.2 




leh4 


2.45 


2.35 


0.0 


2.45 


lupp 


3.05 


1.84 


2.3 


3.04 



f No merohedral or pseudomerohedral twin operator possible. 



and then these £-values can be used in the standard moment 
tests. 

Several test data sets were selected from the PDB for 
structures with pairs of molecules or assemblies in the asym- 
metric unit related by tNCS: PDB entries 2fuq (Shaya et al, 
2006), lun7 (Vincent et al, 2004), ly9r (Fagart et al, 2005), 
leh4 (Mashhoon et al, 2000) and lupp (Karkehabadi et al, 
2003). These cases were chosen to illustrate the effects of 
anisotropy, twinning and small rotational deviations from a 
pure translation. One of these cases, lupp, was also chosen by 
Lebedev et al. (2006) to illustrate the effect of combining 
twinning and tNCS. 

Table 1 shows the results that are obtained by computing 
second intensity moments for centric and acentric reflections 
before and after correction for overall anisotropy and for the 
effects of tNCS. Note that if the data obey standard Wilson 
distributions the expected value for this moment is 3 for 
centric reflections and 2 for acentric reflections, but in the 
presence of perfect twinning the moments would be reduced 
to 2 for centric reflections and 1.5 for acentric reflections 
(Stanley, 1972). To assess the significance of any deviation 
from the values expected for untwinned data, a p-value is also 
shown; this p-value is the probability (computed from the 
observed distribution of intensities) that the true value of the 
second moment for the acentric reflections is 2 or greater. In 
Phaser, a p-value of 0.001 or less triggers a warning that the 
crystal is likely to be twinned. 

As an objective measure of twinning, the twin fraction 
obtained by twin refinement in phenix.refine (Afonine et al, 
2012) is shown for the structures in cells that support mero- 
hedral or pseudomerohedral twinning. In addition, Table 2 
compares the refined values for the tNCS operators with the 
values determined from the deposited models to allow an 
assessment of the simplified model of the crystal used to 
characterize tNCS. 

These tests demonstrate that the correction for the statis- 
tical effects of tNCS can indeed unmask the statistical effects 
of twinning. The p-values for twinned crystals are significantly 
lower than the threshold of 0.001 even when the twin fraction 
is as low as about 0.1. However, for the case of nearly perfect 
twinning in lupp, the second moment is 1.71, which is signif- 
icantly larger than the value of 1.5 that would be expected for 
perfect twinning. This may, at least in part, be because the 
molecular assembly differs significantly from the assumed 
spherical shape with a radius of about 33 A; it is a U-shape 
fitting into a box of approximately 88 x 54 x 42 A. More 



Table 2 

Comparison of estimated and refined tNCS operators. 





Rotation angle (°) 


Translation vector (fractional) 


PDB 






Angular 






code 


Refined 


PDB 


differencet 


Refined 


PDBt 


2fuq 


0.33 


0.89 


0.78 


-0.038, 0.497, 0.000 


-0.038, 0.499, 0.000 


lun7 


2.05 


2.52 


f.05 


0.487, 0.500, 0.500 


0.482, 0.499, 0.500 


ly9r 


2.49 


1.24 


2.60 


0.325, 0.662, 0.589 


0.324, 0.662, 0.589 


leh4 


3.63 


4.00 


1.24 


0.009, 0.007, 0.493 


0.002, 0.010, 0.493 


lupp 


4.01 


3.43 


2.93 


0.004, -0.496, 0.494 


0.007, -0.498, 0.496 



t Angular difference measured using the symmetry-related transformation that agrees 
most closely with the NCS translation in the PDB file and choosing the (arbitrary) 
direction of rotation that minimizes the angular difference. % PDB translation vector 
measured as a vector between centres of mass of common main-chain atoms 



importantly, the twin-related reflections in this case will be 
affected by different modulations, so that the model of the 
effects of tNCS will be a compromise. In lupp the two mole- 
cules are related by a translation of approximately 0, 1/2, 1/2 
and a rotation of 3.43° about an axis very nearly parallel to the 
y axis. The largest modulations will therefore be seen for 
reflections with small h and / indices, for which the rotation has 
very little effect on scattering. However, the twin law is k, h, 
—I, so that reflections near the hQQ axis, with large values of 
the h index and thus relatively little modulation, are super- 
imposed on reflections near the OfcO axis with significant 
modulation. 

The results in Table 2 show that the method is able to detect 
deviations from exact centring operators, even when the 
Patterson peaks merge into a single peak consistent with a 
perfect centring operation. The refined translation vectors 
agree well with the vectors determined from the refined 
models. Also, even though the assumption of spherical mole- 
cules is not necessarily obeyed well, the refined rotations are 
correlated to the true rotations. The rotations are determined 
more accurately when the translations are closer to centring 
operators. In this situation, more of the reflections are affected 
by strong modulations, so that there is more signal from which 
the rotational parameters can be deduced. 

To test whether it is important to model the rotational 
difference between pairs of tNCS-related molecules, or 
whether the refinement of the Luzzati D parameters can 
compensate, we repeated the test calculations for two of the 
crystals that showed a significant rotational difference, lun7 
and leh4, but not allowing the modelled rotation to refine 
away from zero. For lun7, the mean value of the second 
moment of the intensity was 2.25, compared with 1.97 when 
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the rotation was modelled. For leh4, the second moment 
without refining the rotational parameters was 1.94, compared 
with 1.81. Note that a second moment of 1.94 does not differ 
significantly from the value of 2 expected for an untwinned 
crystal, with a p-value of 0.148. These results demonstrate that 
it is indeed important to model the rotational differences when 
characterizing tNCS. 

6. Conclusions 

This analysis has shown that the effects of tNCS depend on 
the exact values of the translation, which can be estimated 
precisely, and on small differences in orientation between the 
NCS-related copies, which can be given better than random 
estimates even under conditions where the simplifying 
assumptions of spherical molecules are not valid. By taking 
account of the statistical effects of tNCS, the statistical effects 
of twinning can be unmasked sufficiently to provide a clear 
diagnostic for twinning. This is important in practice because 
tests that depend on twin laws rely on having the symmetry 
correctly assigned (Lebedev et al, 2006). If the data have been 
merged with too high symmetry these tests cannot be applied, 
but if the data have been merged with too low symmetry then 
these tests will generate false positives. Note that when the 
symmetry is correctly assigned, tests such as the L-test (Padilla 
& Yeates, 2003) are preferable for their ability to assess the 
twin fraction reasonably reliably. In the application of the 
L-test, reflections with indices differing by even numbers are 
typically chosen to minimize the statistical effects of tNCS 
arising from pseudo-centring (Padilla & Yeates, 2003); 
however, when the tNCS differs from a pseudo-centring 
operation it may be helpful to correct for the statistical effects 
of tNCS before applying the L-test. 

In future work, we will show how this understanding of the 
statistical effects of tNCS can be used to improve methods for 
molecular replacement, phasing by single -wavelength anom- 
alous diffraction and structure refinement. 
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